BUS TRANSACTION REORDERING IN A COMPUTER SYSTEM HAVING
UNORDERED SLAVES

FIELD OF THE INVENTION

The present invention relates to computer architecture, in
particular to computer architecture for small computer
systems such as personal computers.

STATE OF THE ART

The PowerPC computer architecture, co-developed by
Apple Computer, represents a departure from prior-generation
small computer architectures. PowerPC machines currently
sold by Apple are based largely on the Motorola MPC601 RISC
microprocessor. Other related processors, including the
MPC 604, MPC 603, MPC 603e, and MPC 602, are currently
available, and additional related processors, including the
MPC 620, will be readily available in the future. The MPC60x
permits separate address bus tenures and data bus tenures,
where tenure is defined as the period of bus mastership. In
other words, rather than considering the system bus as an
indivisible resource and arbitrating for access to the entire
bus, the address and data buses are considered as separate
resources, and arbitration for access to these two buses may
be performed independently. A transaction, or complete
exchange between two bus devices, is minimally comprised of
an address tenure; one or more data tenures may also be
involved in an exchange. There are two kinds of transactions:
address/data and address-only.

Data transfers include single-beat transfers, in which a
single piece of data is transferred, and burst data
transfers. Address and data tenures both consist of three
phases: arbitration, transfer, and termination. FIG. 1 shows
a data transfer that consists of a single-beat transfer (up
to 64 bits). In a four-beat burst transfer, by contrast, data
termination signals are required for each beat of data, but
re-arbitration is not required. Having independent address
and data tenures allows address pipelining (indicated in
FIG. 1 by the fact that the data tenure begins before the
address tenure ends) and split-bus transactions to be
implemented at the system level. Address pipelining allows
new address bus transactions to begin before the current data
bus transaction has finished by overlapping the data bus
tenure associated with a previous address bus tenure with one
or more successive address tenures. Split-bus transaction
capability allows the address bus and data bus to have
different masters at the same time.

For clarity, the basic functions of address and data tenures
will be discussed in somewhat greater detail.

In the case of address tenure, during address arbitration,
address bus arbitration signals are used to gain mastership
of the address bus. Assuming the CPU to be the bus master, it
then transfers the address on the address bus during the
address transfer phase. The address signals, together with
certain transfer attribute signals discussed in greater
detail hereinafter, control the address transfer. After the
address transfer phase, the system uses the address
termination phase to signal that the address tenure is
complete or that it must be repeated.

In the case of data tenure, during data arbitration, the CPU
arbitrates for mastership of the data bus. After the CPU is
the bus master, during the data transfer phase, it samples
the data bus for read operations or drives the data bus for
write operations. Data termination signals occur in the data
termination phase. Data termination signals are required
after each data beat in a data transfer. In a single-beat
transaction, the data termination signals also indicate the
end of the tenure, while in burst accesses, the data
termination signals apply to individual beats and indicate
the end of the tenure only after the final data beat.

Address-only transfers use only the address bus, with no
data transfer involved. This feature is particularly useful
in multi-master and multiprocessor environments, where
external control of on-chip primary caches and TLB
(translation look-aside buffer) entries is desirable.
Additionally, the MPC60x provides a retry capability that
supports an efficient "snooping" protocol for systems with
multiple memory systems (including caches) that must remain
coherent.

Pipelining and split-bus transactions, while they do not
inherently reduce memory latency, can greatly improve
effective bus-memory throughput. The MPC60x bus protocol
does not [...] multiple [...]. In systems in which multiple
devices must compete for the system bus, external arbitration
is required. The arbiter must control the pipeline depth and
synchronization between masters and slaves.

In a traditional pipelined implementation, data bus tenures
are kept in strict order with respect to address tenures.
[...] If a slave receives transactions A, B, and C, then it
will respond to A first, B second, and C third. If a master
performs transactions D, E, and F, then it expects servicing
of those transactions in the order of D first, E second, and
F third. [...] number of outstanding transactions in the
architecture at one time [...] this selected number is three
[...] transactions. As a result, in the foregoing [...] an
expansion bridge may concurrently have [...] outstanding
transaction to it, and one outstanding transaction from it.
Although ordered masters and [...] slaves [...] architecture,
they can lead to a [...] when there are conflicting
completion [...].

Deadlock occurs in a computer system when one resource
cannot complete an access to another resource, and the other
resource cannot complete transactions [...]. Livelock occurs
in a computer system when one resource cannot complete an
access to another resource; other resources are performing
transactions on the bus, but no forward progress can be made
due to the resource's inability to complete its access.

Due to the plethora of design methodologies and
implementations utilized by expansion card vendors, systems
are most prone to deadlocks and livelocks when there is an
expansion bridge in the system. Some potential deadlocks
may be detected and prevented at the bridge level; however,
other pieces of the overall solution may need to be
implemented at a higher level in system arbitration.

The main reason that a deadlock or livelock occurs is that 
each of two different resources that communicate with each 
other assumes that it has top priority in the system.
Unfortunately, when they communicate with each other this 
causes a conflict, and if one does not back off its access, the 
end result is deadlock or livelock. 
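
By way of illustration only, the back-off condition described above may be modeled with a short Python sketch. The function name `simulate`, the cycle limit, and the rule that a resource releases its own bus when it backs off are assumptions made for this example and are not taken from the system described. The sketch shows only that an access completes when at least one of the two resources is willing to back off.

```python
def simulate(a_backs_off: bool, b_backs_off: bool, max_cycles: int = 10) -> str:
    """Hypothetical model: resources A and B each hold their own bus
    and each needs the other's bus to complete an access."""
    holding = {"A": True, "B": True}      # each starts out holding its own bus
    backs_off = {"A": a_backs_off, "B": b_backs_off}
    done = {"A": False, "B": False}
    other = {"A": "B", "B": "A"}

    for _ in range(max_cycles):
        for name in ("A", "B"):
            if done[name]:
                continue
            if not holding[other[name]]:
                # The target bus is free, so the access completes and
                # the resource releases its own bus as well.
                done[name] = True
                holding[name] = False
            elif backs_off[name]:
                # Yield: release the resource's own bus and retry later.
                holding[name] = False
        if all(done.values()):
            return "completed"
    # Neither access finished within the cycle limit.
    return "deadlock"
```

With both arguments false, neither resource yields and the function reports a deadlock; with either argument true, both accesses complete.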

In the architecture of certain Power PC computers of the 
assignee, the top priority bus is known as the ARBus; it is 
the one bus assumed to never have to back off an access. 
However, there may be a need for the ARBus to communi- 
cate with an ISA bus behind an expansion bridge. As history 
recalls, the ISA bus design assumed that any initiated access 
would complete; therefore, an ISA master would not have to 
back off its access. Therein lies the problem. The Power PC 
architecture, in one instance, chose the ARBus to be the bus 
to not back off, and the PC-world chose the ISA bus to be 
the bus to not back off. This conflict of interest could result 
in deadlock. 

In another instance, the Power PC architecture may
incorporate a PCI bus-to-PCI bus ("PCI2PCI") bridge having an
interlocking behavior that disallows access to its slave port
on one side of the PCI2PCI bridge while its master on the
same side of the PCI2PCI bridge has a transaction to
perform. This behavior also means that the PCI2PCI bridge
assumes that it does not have to be backed off, and any 
communication between the ARBus and a target behind the 
PCI2PCI bridge could result in deadlock. 

Although decoupling the address and data buses in a 
computer system enables bus utilization to be greatly 
increased, it would be desirable to further increase bus 
utilization beyond what can reasonably be achieved in a 
system having both ordered masters and ordered slaves. 
Especially desirable would be a computer architecture in 
which bus utilization is increased and in which deadlocks 
are more readily avoided. 
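
The cost of ordering constraints in terms of data bus utilization may be illustrated, under simplifying assumptions, by counting cycles. In the following Python sketch each transaction is a pair (cycle at which its data is ready, number of data beats). The function names and the example values are hypothetical. The reordered model does not enforce any per-master or per-slave ordering, so it should be read as a best case rather than as the behavior of any particular embodiment.

```python
def finish_in_order(txns):
    """Data bus tenures granted strictly in address order."""
    t = 0
    for ready, beats in txns:
        # Wait until this transaction's data is ready, then transfer it.
        t = max(t, ready) + beats
    return t


def finish_reordered(txns):
    """Data bus granted to the oldest transaction whose data is ready."""
    pending = list(txns)
    t = 0
    while pending:
        ready_now = [x for x in pending if x[0] <= t]
        if ready_now:
            chosen = ready_now[0]          # oldest ready transaction
            pending.remove(chosen)
            t += chosen[1]
        else:
            # Nothing is ready: the bus idles until the earliest ready time.
            t = min(x[0] for x in pending)
    return t


# Hypothetical example: one slow single-beat read followed by two
# four-beat transfers whose data is available immediately.
example = [(20, 1), (0, 4), (0, 4)]
```

For these example values the in-order schedule finishes at cycle 29, because both four-beat transfers wait behind the slow read, while the reordered schedule finishes at cycle 21.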

SUMMARY OF THE INVENTION 

A mechanism is provided for reordering bus transactions 
to increase bus utilization in a computer system in which a 
split-transaction bus is bridged to a single-envelope bus. In 
one embodiment, both masters and slaves are ordered, 
simplifying implementation. In another embodiment, the 
system is more loosely coupled with only masters being 
ordered. Greater bus utilization is thereby achieved. In 
accordance with one embodiment of the invention, a queu- 
ing structure includes multiple master queues and multiple 
slave queues. The queuing structure receives bus grant 
signals and respective slave acknowledge signals from 
respective slave devices. Each time an address bus grant is 
issued a record is entered in the queuing structure, the record 
comprising a first entry in a master queue identified by the 
address bus grant signals, and a second entry in a slave 
queue identified by the slave acknowledge signals. The first 
entry identifies a target slave device in accordance with the 
slave acknowledge signals, and the second entry identifies 
an originating master device in accordance with the address 
bus grant signals. A matching circuit is responsive to queue 
entries from the queuing structure for producing match bits 
identifying selected records the first entry of which is at the 
head of a master queue. A data arbitration circuit is respon- 
sive to the match bits and to queue entries from the queuing 
structure for generating data bus grant signals for the master 
devices and for generating for each slave device a multibit
signal which when active identifies a transaction within the 
transaction queue of the slave device. 
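
The queuing structure summarized above may be sketched, for illustration only, as a small behavioral model in Python. The class name `QueuingStructure`, its method names, the use of integer indices for masters and slaves, and the choice of granting to the lowest-numbered matched master are all assumptions of this example. The match rule used here (the slave named at the head of a master queue must have that master at the head of its own queue) corresponds to ordered slaves and is a simplification; readiness of read data is not modeled.

```python
from collections import deque


class QueuingStructure:
    """Hypothetical behavioral model of the master and slave queues."""

    def __init__(self, num_masters: int, num_slaves: int):
        self.master_queues = [deque() for _ in range(num_masters)]
        self.slave_queues = [deque() for _ in range(num_slaves)]

    def record(self, master: int, slave: int) -> None:
        """Enter one record for a granted and acknowledged address tenure."""
        self.master_queues[master].append(slave)   # first entry: target slave
        self.slave_queues[slave].append(master)    # second entry: originating master

    def match_bits(self) -> list:
        """One bit per master: set when the slave named at the head of the
        master's queue has that master at the head of its own queue."""
        bits = []
        for master, queue in enumerate(self.master_queues):
            matched = bool(queue) and self.slave_queues[queue[0]][0] == master
            bits.append(matched)
        return bits

    def grant(self):
        """Grant the data bus to the lowest-numbered matched master
        (an assumed priority). Returns (master, slave) or None."""
        for master, matched in enumerate(self.match_bits()):
            if matched:
                slave = self.master_queues[master].popleft()
                self.slave_queues[slave].popleft()
                return master, slave
        return None
```

In this model, if master 1 and then master 0 both address slave 3, only master 1 matches at first; after its grant, master 0 reaches the head of the slave's queue and matches in turn.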

DESCRIPTION OF THE DRAWINGS

The present invention may be further understood from the
following description in conjunction with the appended
drawing. In the drawing:

FIG. 1 is a diagram illustrating overlapping tenures for a
single-beat transfer on a conventional MPC601 bus;

FIG. 2 is a system-level block diagram of a computer
system in which the present invention may be used;

FIG. 3 is a block diagram of the memory controller 300
of FIG. 2;

FIG. 4 is a timing diagram showing conventional usage of
the MPC601 bus;

FIG. 5 is a timing diagram showing usage of the ARBus
(a superset of the MPC601 bus) in the high-performance
computer architecture of FIG. 2;

FIG. 6 is a block diagram of the arbiter 600 of FIG. 3;

FIG. 7 is a block diagram of the expansion bridge 700 of
FIG. 2;

FIG. 8 illustrates a deadlock in which an ARBus master
read of an expansion bridge is followed by an ARBus master
read of memory;

FIG. 9 illustrates a deadlock in which an ARBus master
read of an expansion bridge is followed by an ARBus master
L2 hit or allocate operation;

FIG. 10 illustrates a deadlock in which a processor read
of an expansion bridge is followed by a processor write to
that expansion bridge;

FIG. 11 illustrates a deadlock in which a Bus Grant signal
and an Address Retry signal occur concurrently;

FIG. 12 illustrates a deadlock in which a Bus Request
signal and an Address Retry signal occur concurrently;

FIG. 13 illustrates a deadlock in which expansion bridges
read each other concurrently;

FIG. 14 illustrates a deadlock in which one master
attempts to read both expansion bridges;

FIG. 15 illustrates a deadlock in which an ISA bus master
reads a target behind an opposite expansion bridge;

FIG. 16 illustrates a deadlock in which a PCI bus master
read gets stuck behind a posted PCI bus master write;

FIG. 17 illustrates a deadlock in which the ARBus
transaction limit is hit, and accesses cannot complete;

FIG. 18 illustrates a deadlock in which one expansion
bridge, with an outstanding ARBus read, accepts a read from
another expansion bridge;

FIG. 19 is a block diagram of another embodiment of the
arbiter 600 of FIG. 3;

FIG. 20 is a block diagram showing the input and output
signals of the ArbMux 603' of FIG. 19;

FIG. 21 is a block diagram showing the input and output
signals of the ArbMux 603' of FIG. 19 in greater detail;

FIG. 22 is a block diagram showing the input and output
signals of the ArbDatSM 604' of FIG. 19;

FIG. 23 is a block diagram of a bit filter portion of the
ArbDatSM 604' of FIG. 19;

FIG. 24 is a block diagram showing the input and output
signals of the ArbDatSM 604' of FIG. 19 in greater detail;

FIG. 25 is a block diagram showing the input and output
signals of the ARtryGen block 613' of FIG. 19; and

FIG. 26 is a block diagram showing the input and output
signals of the ARtryGen block 613' of FIG. 19 in greater
detail.

DETAILED DESCRIPTION OF THE INVENTION

In the following description, the system architecture of a
computer system in which the present invention may be used
will first be described, including a description of the
MPC601 bus, the ARBus, which is a superset of the MPC601 bus,
a system arbiter and an expansion bridge. Deadlock avoidance
will then be described, beginning with a description of the
types of deadlocks and livelocks that may occur in the
system, followed by a description of specific deadlock and
livelock situations for both a system having a single
expansion bridge and a system having two or more expansion
bridges. Rules will be identified for avoiding deadlock.
These rules will then be summarized, both for the case of a
single expansion bridge and for the case of two or more
expansion bridges. Finally, the manner in which the rules
are implemented in the system will be described.

Referring now to FIG. 2, the present invention may be used
in a computer system of the type shown. A CPU 203 (for
example a Power PC 601 microprocessor) is connected to a
system bus 204, including a data bus 205, an address bus 206,
and a control bus (not shown). A memory subsystem 208
includes, in the illustrated embodiment, a main memory 209,
a read-only memory 211, and a level-two cache memory 212.
The CPU 203, through the system bus 204, is connected
directly to the level-two cache memory 212. The CPU 203 is
connected indirectly to the main memory 209 and the
read-only memory 211, through a datapath circuit 221 and a
memory controller 300. In general, the datapath circuit 221
provides for 64- or 128-bit reads from and writes to memory,
in either big-endian or little-endian mode. The memory
controller 300 controls the various memory devices within
the memory subsystem 208 in response to signals on the
system bus 204 and, in particular, provides address and
control signals (i.e., RAS and CAS) to the main memory 209.
The datapath circuit 221 and the memory controller 300 are
connected by a register data bus 217.

Also shown is an optional secondary processor 218 which,
like the CPU 203, may be a Power PC 601 microprocessor for
example.

The system bus 204 is also connected to an expansion bus
bridge 219 (possibly more than one) and, optionally, a video
bus bridge 220. In a preferred embodiment, the system bus 204
is a superset of the conventional Power PC 601 microprocessor
interface, referred to herein as the Apple RISC Bus, or
ARBus. An expansion bus connected to the expansion bus
bridge 219 may be a standard PCI bus. Likewise, a video bus
connected to the video bus bridge 220 may be a PCI-like bus.

Referring to FIG. 3, the memory subsystem 208, including the
memory controller 300 of FIG. 2, is shown in greater detail,
with particular emphasis on the various signals input to and
output from the memory controller 300. The memory controller
300 includes a main memory controller 302, a cache/ROM
controller 305, and an arbiter 600. The main memory
controller 302 produces address and control signals for the
main memory 209 and includes a DRAM sequencer 303 and
certain memory address logic. The cache/ROM controller 305
produces control signals for the level-two cache memory 212
and the read-only memory 211 and includes a cache/ROM
sequencer 306 and certain cache logic. Both the main memory
controller 302 and the cache/ROM controller 305 exchange
control signals with the arbiter 600, which executes overall
control of the memory controller 300 and which is more
particularly the subject of the following description.

The arbiter 600 includes a register file (not shown) that
may be written and read by the CPU 203 across the register
data bus 217. The register file includes, in addition to
numerous base address registers, various ID, configuration
and timing registers. The particulars of these registers are
not essential to an understanding of the present invention
and will not be further described. The arbiter 600 inputs
various control signals from and outputs various control
signals to a control bus 309. Some of the control signals
carried by the control bus 309 are part of the conventional
PowerPC 601 microprocessor interface. The majority of the
signals carried by the control bus 309, however, are
side-band information signals used in accordance with the
present invention to independently control the address bus
206 and the data bus 205.

Prior to describing in detail the manner in which these
side-band information signals are used to decouple the
address bus 206 and the data bus 205, it will be useful to
consider what is termed herein conventional usage of the
PowerPC 601 microprocessor interface.

As shown in FIG. 1, address tenure and data tenure both
have arbitration, transfer and termination phases. Each of
these phases involves the exchange of respective handshaking
signals. Referring to FIG. 4, the handshaking signals that
characterize the address arbitration phase are a bus request
signal BR and a bus grant signal BG. The bus request signal
BR is an output signal of the CPU 203. The bus grant signal
BG is an input signal of the CPU 203 and is output by the
arbiter 600. Both the bus request signal BR and the bus
grant signal BG relate to the address bus 206. When the CPU
203 has received the bus grant signal BG, it is free to
enter the address transfer phase.

During the address transfer phase, a transfer start signal
TS is asserted by the CPU 203 when the CPU 203 begins to
drive the address bus 206. The address is decoded by a slave
device as belonging to that device, i.e., falling within the
device's assigned address space. During the address
termination phase, the slave device asserts the address
acknowledge signal AACK after it has sampled the address on
the address bus 206.

During the address transfer phase, certain transfer
attribute signals are used to indicate the nature of the
transaction, including whether the transaction is an
address-only transaction. Assuming that the transaction is
not, then the transfer start signal TS is treated by the
arbiter 600 as an implicit data bus request, starting the
data arbitration phase. Following assertion of the
acknowledge signal AACK, a data bus grant signal DBG is
asserted by the arbiter 600 once the data bus 205 is
available for use by the CPU 203. The CPU 203 may begin the
data transfer phase on the next cycle by driving the data
bus 205. During a subsequent data termination phase, the
slave device asserts a transfer acknowledge signal TA after
it has sampled the data on the data bus 205.

The foregoing sequence of operations is repeated for a
second subsequent transaction. In FIG. 4, the transaction to
which address and data information pertain is indicated in
parentheses, i.e., transaction (1) and transaction (2).

Note that in FIG. 4, address tenures and data tenures,
although they may be pipelined, are tightly ordered. That
is, data bus tenure on the system is granted in the same
order as address tenure is granted even if the address
tenures are
granted to different masters. In precise terms, if TS(n) is
for Master A and TS(n+1) is for Master B, then DBG(n) will
be for Master A and DBG(n+1) will be for Master B.

This tight ordering of the conventional MPC601 bus may
result in considerable system performance degradation,
especially as bus speed increases. A read transaction to an
expansion-bus device, for example, will typically be
high-latency as compared to a main-memory read transaction.
Tight ordering of address and data tenures results in such
latency impacting the data bus. That is, even though another
transaction might be ready to use the data bus first, during
the latency period, it cannot because of the tight ordering
of address and data tenures. If a system is to handle
information streams having real-time constraints, such as
video streams, it is important to ensure that the data bus
is not unavailable for use during substantial periods of
time; otherwise real-time deadlines may be missed, resulting
in objectionable artifacts during presentation.

The architecture of the computer system of FIG. 2 decouples
address and data tenures such that data bus utilization is
increased. This increase in data bus utilization allows for
higher real-time performance to be achieved. In particular,
the present invention allows for a true split-bus
architecture with ordered slaves and ordered masters.
"Ordered," in one usage, means each master and each slave
has its own independent FIFO structure supporting "ordered"
service to transactions posted to it. If a slave receives
three transactions A, B, and C, then it will respond to A
first, B second, and C third. If a master performs
transactions D, E, and F, then it expects servicing of those
transactions in the order of D first, E second, and F third.
In one embodiment, there can be up to three outstanding
master/slave pair transactions at one time.

Referring briefly again to FIG. 3, the side-band information
signals carried by the control bus 309 are used to decouple
the address bus 206 and the data bus 205. These side-band
information signals include, in addition to the bus request
signal BR, the bus grant signal BG and the data bus grant
signal DBG of FIG. 4, corresponding signals for each master
besides the CPU 203.

In one embodiment, the system includes, besides the CPU 203,
up to four additional masters, for a total of up to five
masters: the CPU 203, the secondary processor 218 (if
present), the expansion bus bridge 219, one additional
expansion bus bridge (if present), and the video bus bridge
220 (if present). The control bus 309 therefore carries five
bus request signals BR[0:4], five bus grant signals BG[0:4],
and five data bus grant signals DBG[0:4].

In the same embodiment, the system includes six slaves: the
expansion bus bridge 219 (also a master), the additional
expansion bus bridge (also a master, if present), the video
bus bridge 220 (also a master, if present), the main memory
209, the read-only memory 211, and memory controller
registers accessible via the register data bus 217. For each
slave, the control bus 309 carries three signals: a slave
acknowledge signal SACK, a read data available signal RDDA,
and a source- or sink-data signal SSD. The control bus 309
therefore carries six slave acknowledge signals SACK[0:5],
six read data acknowledge signals RDDA[0:5], and six source-
or sink-data signals SSD[0:5].

The manner in which the foregoing signals are used to
decouple address tenures and data tenures may be appreciated
with reference to FIG. 5. For simplicity, the address
arbitration phase has not been illustrated. The address
transfer phase is essentially the same as in the
conventional case. The address termination phase, however,
differs. The addressed slave asserts the AACK signal in the
conventional manner, the AACK signal being used by the
master. In parallel with AACK, the addressed slave generates
a SACK signal for use by the arbiter 600. The arbiter uses
this information about which slave has acknowledged in order
to reorder transactions on the system bus 204.

In the data arbitration phase, the data bus is granted to
masters based on a priority ordering of masters, and is
granted to slaves based in part on the priority of the
master of the transaction and in part on the availability of
data from the slave. What may be considered in effect two
sets of grant signals are therefore defined,
DBG[0:#Masters-1] for masters and SSD[0:#Slaves-1] for
slaves.

Assume, for example, that in FIG. 5 the first transaction is
a read by the CPU 203 from the expansion bus bridge 219 and
that the second and third transactions are writes to memory
from the video bus bridge 220. In general, video
transactions will be assigned a higher priority than
transactions by the CPU 203 because of the real-time
requirements of video transactions. Data bus grant signals
are therefore issued to the video bus bridge 220 for the
first video transaction (2), which proceeds through the data
transfer phase, and the second video transaction (3), which
also proceeds through the data transfer phase. The CPU 203
will not be issued a data bus grant signal for its read from
the expansion bus bridge 219 until a read data acknowledge
signal has been returned to the arbiter 600 from the
expansion bus bridge 219. Then, the CPU 203 will be issued a
data bus grant signal for its read and the expansion bus
bridge 219 will simultaneously be issued a corresponding
slave source-data signal causing it to present its data on
the data bus 205 to be sampled by the CPU 203.

As may be appreciated from the foregoing description, the
data arbitration phase in accordance with the present
invention is very different than in the conventional case.
This different manner of operation allows address and data
tenures to be decoupled, increasing utilization of the data
bus. The data transfer and data termination phases, however,
are essentially the same as in the conventional case.

Transaction reordering is controlled by the arbiter 600.
The general characteristics of the arbiter 600 will first be
described, after which the arbiter 600 will be described in
greater detail.

The basic behavior that the arbiter 600 guarantees is as
follows:

Any given ARBus master has its own address and data tenures
strictly ordered. That is, DBG(n) always corresponds to
TS(n) and for a set of TS(n) and TS(n+1), DBG(n) will always
occur before DBG(n+1).

Any given ARBus slave has its own data tenures strictly
ordered. That is, SSD(n) always corresponds to TS(n) and for
a set of TS(n) and TS(n+1), SSD(n) will always occur before
SSD(n+1).

Data bus tenure is not necessarily granted on the ARBus in
the same order as address tenure is granted if the address
tenures are granted to different masters. That is, if TS(n)
is for Master A and TS(n+1) is for Master B, DBG(n) may be
for Master B and therefore DBG(n+1) for Master A.

In the illustrated embodiment, the arbiter 600 supports five
logical masters. The five masters arbitrate for use of the
bus in accordance with a fixed priority as follows: the
video bus bridge 220, the expansion bus bridge 219, an
additional expansion bus bridge (if present), the CPU 203,
and the secondary processor 218. By giving highest priority
to the
video bus bridge 220, the arbiter 600 allows the video bus
bridge 220 to "hog" the ARBus.

The arbiter 600 may optionally "park" the CPU 203 or the
video bus bridge 220 on the ARBus by asserting the
appropriate BG wire during idle bus cycles. The default mode
of operation is to park the most recent master.

Address bus arbitration occurs in every cycle that an
address tenure is not active. Masters assert their
individual bus request signals (BR) to the arbiter 600 to
signal a request for service. The arbiter 600 signals the
master which has won the arbitration by asserting bus grant
(BG). Masters that have BG asserted in a given cycle are
free to assert TS and therefore start a transaction in the
next cycle.

The arbiter 600 controls the use of the data signals as a
function of the address and the availability of read data.
If a given ARBus address receives an AACK, the arbiter 600,
by sampling the SACK signals, knows which slave will accept
data or will return read data. A slave that asserts AACK for
a write transaction gives implicit permission to the arbiter
600 to grant the data bus to the master and allow it to
assert the associated write data. Slaves must assert RDDA
when the requested read data is available.

The arbiter 600 grants the data bus to a selected master via
the assertion of DBG (Data Bus Grant) and indicates to the
slave that data is to be asserted or accepted via the
assertion of SSD (Source or Sink Data).

Transactions which do not involve a data transfer
(Address-Only transactions) are typically generated by the
CPU 203 or the secondary processor 218 and are simply
acknowledged (AACK asserted) by the arbiter 600.

Referring now to FIG. 6, the arbiter 600 will be described
in greater detail. The arbiter 600 includes master queues
601, one for each master in the system, and slave queues
602, one for each slave in the system. Each of the master
queues 601 is connected at its data input to a SACK vector
composed of the slave acknowledge signals SACK of each of
the slaves, in addition to a Rd/Wr signal. Hereinafter, the
term "SACK vector" will be understood to mean signals
including the slave acknowledge signals SACK of each of the
slaves and the Rd/Wr signal. Each of the slave queues 602 is
connected at its data input to a BG vector composed of the
bus grant signals BG of each of the masters. (In more
precise terms, the BG vector is the physical bus grant
signals sampled in the cycle that the TS signal is
asserted.) The bus grant signals BG are produced by an
address bus arbiter state machine 605 in response to the bus
request signals BR of each of the masters.

Each time the address acknowledge signal AACK is presented
on the system bus 204, the master queues 601 and the slave
queues 602 are updated by pushing the SACK vector onto one
(and only one) of the master queues 601 and pushing the BG
vector onto one (and only one) of the slave queues 602. In
particular, the SACK vector is pushed onto one of the master
queues 601 identified by the BG vector, and the BG vector is
pushed onto one of the slave queues 602 identified by the
SACK vector.

The SACK vectors at the heads of the master queues 601 and
the BG vectors at the heads of the slave queues 602 are

Based on the foregoing input signals, the arbiter
multiplexer 603 produces a slave match vector SlvMatch and a
slave read ready vector SlvRdReady. The slave match vector
SlvMatch designates those masters finding matching slaves,
i.e., slaves expecting to next respond to transactions from
those respective masters. The slave read ready vector
SlvRdReady identifies, of those masters, which have slaves
that are actually ready to source data. The slave match
vector SlvMatch and the slave read ready vector SlvRdReady
are input to a data bus arbiter state machine 604.

The SACK vectors at the head of the master queues 601 are
also input to the data bus arbiter state machine 604. The
data bus arbiter state machine 604 determines which
transaction to service by examining the bits of the SlvMatch
vector in priority order and, if a bit indicates a matching
slave, determining further whether either the transaction is
a write transaction (by examining the Rd/Wr bits at the
front master queue entries) or the corresponding bit in the
SlvRdReady vector is set, indicating that the slave is ready
to source data. In Verilog notation, the data bus arbiter
state machine 604 computes a vector TransReady as follows:

    TransReady[0:4] = SlvMatch[0:4] & ({5{Write}} | SlvRdReady[0:4])

Based on the computed TransReady vector, the data bus
arbiter state machine 604 asserts a corresponding one of the
data bus grant signals DBG. The data bus arbiter state
machine 604 also asserts a corresponding one of the
source-or-sink-data signals SSD, in accordance with the SACK
vector at the front of the winning master queue.

Operation of the arbiter 600 may be further understood from
the following illustrative examples.

To take a relatively simple example, assume that Master 1
(the expansion bus bridge 219) issues a read transaction to
Slave 3 (the video bus bridge 220). Slave 3, when it is
ready to service the transaction, asserts the AACK signal on
the ARBus and, at the same time, generates a SACK signal to
the arbiter 600 identifying Slave 3. When the arbiter 600
receives the AACK signal, the SACK vector is pushed onto one
of the master queues 601 based on the BG vector. At the same
time, the BG vector is pushed onto one of the slave queues
602 based on the SACK vector. Assuming that no other
transactions are presently queued, a SACK vector value
representing Slave 3 (for example b111011) will appear at
the head of the one of the master queues 601 for Master 1,
and a BG vector value representing Master 1 (for example
b10111) will appear at the head of the one of the slave
queues 602 for Slave 3. The arbiter multiplexer 603 will
therefore cause the SlvMatch vector to have a value
indicating a match for Master 1 (for example b01000). When
Slave 3 is ready with read data, it will assert its RDDA
signal, in response to which the arbiter multiplexer 603
will cause the SlvRdReady vector to have a value indicating
the readiness of Slave 3 (for example b00100). If no other
transactions having higher priority have in the meantime
become ready to go, the data bus arbiter state machine 604
will then issue a data bus grant signal DBG to Master 1 and

input to an arbiter multiplexer 603. The arbiter multiplexer a sink/source data signal SSD to Slave 3, and the data 

603 looks at the SACK vectors at the head of the master transfer phase of the transaction will proceed. 

queues 601 and determines which of the slave queues 602 60 To take another, more complex example, assume that after 

designated by the SACK vectors have at their heads a BG Master 1 has issued the foregoing transacUon request 

vector that designates the reciprocal one of the master (shown below as TransacUon 1) but before Slave 3 has 

queues 601. On the next data tenure of the masters for which responded with an RDDA signal, a series of further trans- 

this condition is satisfied, data will be sourced from the actions is issued, in accordance with the following chrono- 

corresponding slave. The arbiter multiplexer 603 also 65 logical sequence: 

receives a read-ready vector RDDA composed of the read 1. Master 1 Rd Slave 3 

data acknowledge signals RDDA of each of the slaves. 2. Master 3 Wr Slave 3 
3. Master 3 Wr Slave 0

4. Master 4 Rd Slave 1 

5. Master 2 Wr Slave 4 

Note that transactions 1 and 2 both involve Slave 3, and 
transactions 2 and 3 both involve Master 3. Because masters 
and slaves are ordered, data dependencies are created. That 
is, transaction 2 cannot complete until transaction 1 has 
completed. Similarly, transaction 3 cannot complete until 
transaction 2 has completed. Transactions 4 and 5, on the 
other hand, have no data dependencies. Transaction 4 is a 
read from Master 4 (CPU 1) to Slave 1 (ROM). In the case 
of ROM and RAM, because read latency is minimal and is
known in advance, the RDDA signals for ROM and RAM are
tied asserted.

Transaction 2, Master 3's write of Slave 3, is queued up
behind Master 1's read of Slave 3. Transaction 3, Master 3's
write of Slave 0, is queued up behind Master 3's write of
Slave 3. When transaction 4 is queued, there are matching
queue entries at the head of the master and slave queues for
transactions 1 and 4. Transaction 1, however, is a read 
transaction and is not allowed to proceed until an RDDA is 
received from Slave 3. 

Therefore, the arbiter 600 first grants the data bus to 
Master 4 and Slave 1 for transaction 4. When transaction 5 
is queued, there are matching queue entries at the head of the 
master and slave queues for transactions 1 and 5. Assume, 
however, that an RDDA has still not been received from 
Slave 3. The arbiter 600 will then grant the data bus to 
Master 2 and Slave 4 for transaction 5. 

Assume now that an RDDA is received from Slave 3. 
Transactions 1, 2 and 3 will then, in that order, be granted the 
bus and will complete. In the foregoing example, whereas 
the address order of the transactions is 1, 2, 3, 4, 5, the data 
order is 4, 5, 1, 2, 3. 
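
By way of illustration only, the queue behavior of the foregoing
example may be sketched in Python as follows. The names Arbiter,
aack, grant and run_example are hypothetical and do not appear in
the description above. The sketch further assumes that, among
several ready masters, the lowest-numbered master is granted first,
and that an RDDA signal stays asserted once set; neither assumption
is taken from the description.

```python
from collections import deque


class Arbiter:
    """Toy model of the master/slave queues and the TransReady vector."""

    def __init__(self, n_masters=5, n_slaves=5):
        # Each master queue entry: (slave, is_write, txn_id).
        self.master_q = [deque() for _ in range(n_masters)]
        # Each slave queue entry: the master that addressed the slave.
        self.slave_q = [deque() for _ in range(n_slaves)]
        # Read-ready (RDDA) state per slave.
        self.rdda = [False] * n_slaves

    def aack(self, txn_id, master, slave, is_write):
        """On AACK, push one entry on one master queue and one slave queue."""
        self.master_q[master].append((slave, is_write, txn_id))
        self.slave_q[slave].append(master)

    def trans_ready(self):
        """Per master: SlvMatch & (Write | SlvRdReady)."""
        ready = []
        for master, queue in enumerate(self.master_q):
            if not queue:
                ready.append(False)
                continue
            slave, is_write, _ = queue[0]
            slv_match = self.slave_q[slave][0] == master
            ready.append(slv_match and (is_write or self.rdda[slave]))
        return ready

    def grant(self):
        """Grant the data bus to the first ready master (assumed priority)."""
        for master, ok in enumerate(self.trans_ready()):
            if ok:
                slave, _, txn_id = self.master_q[master].popleft()
                self.slave_q[slave].popleft()
                return txn_id
        return None


def run_example():
    """Replay transactions 1 to 5 and return the data-tenure order."""
    arb = Arbiter()
    arb.rdda[1] = True  # Slave 1 (ROM): RDDA tied asserted
    order = []
    address_order = [
        (1, 1, 3, False),  # 1. Master 1 Rd Slave 3
        (2, 3, 3, True),   # 2. Master 3 Wr Slave 3
        (3, 3, 0, True),   # 3. Master 3 Wr Slave 0
        (4, 4, 1, False),  # 4. Master 4 Rd Slave 1
        (5, 2, 4, True),   # 5. Master 2 Wr Slave 4
    ]
    for txn_id, master, slave, is_write in address_order:
        arb.aack(txn_id, master, slave, is_write)
        granted = arb.grant()
        if granted is not None:
            order.append(granted)
    arb.rdda[3] = True  # Slave 3 finally asserts RDDA
    while (granted := arb.grant()) is not None:
        order.append(granted)
    return order


if __name__ == "__main__":
    print(run_example())
```

Under these assumptions the sketch yields the data order 4, 5, 1, 2,
3 for the address order 1, 2, 3, 4, 5, in agreement with the example
above.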

When the system is totally idle, i.e., the data bus is not 
busy and all queues are empty, a CPU memory read trans- 
action is executed immediately without queuing the trans- 
action. 

The expansion bridge responds to transactions on the 
ARBus and PCI Bus and forwards them to the "other" bus 
appropriately. The primary function of the expansion bridge 
is to map transactions from one bus to the other. The job of 
the expansion bridge to transfer data between the ARBus 
and the PCI Bus is complicated by the fact that the ARBus 
and the PCI Bus are very different in a number of respects 
as shown in the following table: 

TABLE 1

BUS CHARACTERISTIC     ARBUS                     PCI BUS

ADDRESS/DATA TENURES   Full split transaction    Single envelope
                       (pended)                  (non-pended)
ENDIANNESS             Big endian                Little endian
CYCLE TYPES            One cycle type            Many cycle types
TRANSACTION LENGTHS    Fixed (32-byte)           Arbitrary length
                       burst length              transactions with
                                                 byte-enabled writes
BUS SPEED              Up to 50 MHz              33 MHz


The PowerPC architecture and the ARBus do not "naturally"
generate many types of cycles that are required by the
PCI specification. These unique PCI Bus cycles are included
in the PCI specification to provide backwards compatibility
for x86/ISA/IBM PC-AT cards and software. The expansion
bridge provides facilities for generating PCI Bus
configuration cycles, I/O cycles and PCI "Special Cycles"/
"Interrupt Acknowledge" via special address spaces.



Referring now to FIG. 7, the expansion bridge 700 will be
described in greater detail. The expansion bridge is
constructed with two main state machines for the ARBus and
PCI Bus. The two main state machines actually consist of a
number of smaller sub-state machines. These state machines
operate in different clock domains and require that
handshake signals be synchronized. Transactions passed between
the ARBus and the PCI Bus are staged in a large
packet-buffer structure. Data endian conversion is performed on the
ARBus side of the packet buffer with data being stored in the
packet buffer in PCI Bus Little Endian format. Address
endian swizzling is performed on the master side of a
transaction. For a master cycle to the PCI Bus from the
ARBus, the address swizzling occurs on the ARBus side.
For a master cycle to the ARBus from the PCI Bus, the
address swizzling occurs on the PCI Bus side.
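
As a minimal sketch of what a data endian conversion involves, the
following Python fragment reverses the byte order of a word. The
function name swap_endian_32 and the 32-bit word width are
assumptions made for illustration; the description above does not
state the width at which the bridge converts data, and the fragment
does not model address swizzling.

```python
def swap_endian_32(word: int) -> int:
    """Reverse the byte order of a 32-bit word (hypothetical helper)."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit unsigned value")
    # Serialize most-significant byte first, then reinterpret the same
    # bytes with the least-significant byte first.
    return int.from_bytes(word.to_bytes(4, "big"), "little")


if __name__ == "__main__":
    print(hex(swap_endian_32(0x11223344)))
```

Applying the conversion twice returns the original word, so the same
operation serves in both directions.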

As explained previously, systems are most prone to deadlocks
and livelocks when there is an expansion bridge in the
system. In the description that follows, a deadlock will be
introduced, together with its LockUp type (A, B, or C as
described below), a solution for the deadlock, and where in
the system the deadlock prevention logic preferably resides.
Deadlock prevention rules assume a starting point behavior
in which the expansion bridge allows concurrent reads
through the bridge, and the ARBus arbiter performs the
DBWO* protocol as necessary. The DBWO* protocol
allows the Processor to re-order a write data phase around a
read data phase for snoop pushes.

An entire class of deadlocks and livelocks is related to the
PCI Bus being stalled during reads. During a read, the PCI
Bus can potentially remain stalled for microseconds at a
time when the target of the read is on the other side of a
bridge. For instance, a Master on PCI Bus 1 wants to read
from a target behind a PCI2PCI bridge on PCI Bus 2. In this
case the master incurs the latency of three bridges (a first
expansion bus bridge, a second expansion bus bridge, and a
PCI2PCI bridge) before actually reaching the target, and no
other transactions can occur on PCI Bus 1 as long as the read
is stalling the bus. If other transactions from the ARBus were
able to get access to the PCI Bus and complete, then the class
of deadlocks related to conflicting completion orders would
disappear. This type of lockup is referred to herein as Type-A
LockUp.
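
A conflict of completion orders can be viewed as a cycle in a graph
of "must complete before" relations. The following Python sketch
detects such a cycle. The function find_cycle, the transaction
labels, and the three-transaction scenario (a processor read of the
bridge, a processor read of memory, and a PCI master read of memory
that stalls the PCI Bus) are hypothetical and are offered only to
illustrate the idea; they are not the detection logic of the arbiter
or the bridge.

```python
def find_cycle(edges):
    """Return a list of nodes forming a cycle, or None if there is none.

    Each edge (a, b) means "a must complete before b".
    """
    graph = {}
    for before, after in edges:
        graph.setdefault(before, []).append(after)
        graph.setdefault(after, [])

    state = {}  # node -> "visiting" or "done"

    def dfs(node, path):
        state[node] = "visiting"
        path.append(node)
        for nxt in graph[node]:
            if state.get(nxt) == "visiting":
                # Back edge: the cycle is the path from nxt onward.
                return path[path.index(nxt):]
            if nxt not in state:
                found = dfs(nxt, path)
                if found:
                    return found
        path.pop()
        state[node] = "done"
        return None

    for node in graph:
        if node not in state:
            cycle = dfs(node, [])
            if cycle:
                return cycle
    return None


# Hypothetical completion-order constraints:
#   the processor is ordered: bridge read before memory read;
#   memory is ordered: processor read before PCI master read;
#   the stalled PCI Bus: PCI master read before the bridge read.
CONFLICTING_ORDERS = [
    ("cpu_read_bridge", "cpu_read_memory"),
    ("cpu_read_memory", "pci_read_memory"),
    ("pci_read_memory", "cpu_read_bridge"),
]

if __name__ == "__main__":
    print(find_cycle(CONFLICTING_ORDERS))
```

Removing any one constraint breaks the cycle, which corresponds to
letting one of the transactions be retried rather than accepted.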

Another class of deadlocks and livelocks is related to the
ISA bus and PCI2PCI bridge behavior. When an ARBus read
occurs to an ISA bus or a target behind a PCI2PCI bridge,
it has no way of knowing whether it will complete or be
blocked. A "block" can occur for the ISA bus if there is an
ISA bus master already on the ISA bus with a pending
transaction; this transaction may or may not require ARBus
access. A "block" can also occur for the PCI2PCI bridge if
the bridge has writes posted to it that it must perform on the
host side of the PCI2PCI bridge before completing the read.
In either of these two cases, there is an ARBus master that
will wait forever for its read to either the ISA bus or
PCI2PCI bridge to complete. If anything "blocks" the ISA
bus or PCI2PCI bridge from completing its non-back-offable
access, deadlock will occur. This type of lockup is referred
to herein as Type-B LockUp.

A third class of deadlocks and livelocks is related to the ARBus
arbiter being fixed priority, and to cross-communication problems
between devices on the bus that are both masters and slaves. Lower
priority masters can be starved from gaining ownership of the ARBus
when following the generic ARBus rules set forth for behavior
following an ARBus ARTRY*. If in addition, the lower priority master
is unable to accept transactions as a slave, deadlocks or livelocks
can occur. This type of lockup is referred to herein as Type-C
LockUp.

Deadlock avoidance is complicated by the fact that in some systems
there may be more than one expansion bridge. Hence, deadlocks in a
true split bus architecture having only a single expansion bridge
connected to the ARBus will be considered first, followed by a
consideration of deadlocks in a true split bus architecture having
two expansion bridges. Systems having more than two expansion
bridges will not be considered, although similar deadlock avoidance
principles may be applied to such systems.

Various deadlocks can occur with a single expansion bridge in a
system implemented with a split bus (ARBus), ordered masters,
ordered slaves, and utilizing a fixed priority arbitration scheme
for the masters on the ARBus. These deadlocks can also occur in a
dual expansion bridge system with the same characteristics, but only
one expansion bridge need be involved to cause the deadlock.

Referring to FIG. 8, deadlock may occur when an ARBus master read of
an expansion bridge is followed by an ARBus master read to memory. A
typical sequence of transactions is as follows:

1. PCI Bus 1 Master initiates read of main memory, and stalls PCI
   Bus 1.
2. Processor 1 reads target behind Expansion Bridge 1 (Expansion
   Bridge 1 AAck*s without ARTRY*).
3. Processor 1 reads main memory (Memory Controller AAck*s without
   ARTRY*).
4. Expansion Bridge 1 forwards read of main memory (Memory
   Controller AAck*s without ARTRY*).

Master Processor 1 has ordered itself: a) Expansion Bridge 1, b)
Read main memory. Slave main memory has ordered itself: a) Read by
Processor 1, b) Read by Expansion Bridge 1. PCI Bus 1 has an implied
ordering of a) Read main memory, b) Read by Expansion Bridge. PCI
Bus 1 is stalled by the read of main memory and will not get off the
bus until the read has completed. In this case, the completion order
of Master Processor 1 directly conflicts with completion order of
PCI Bus 1. This is a Type-A LockUp. There are two potential
solutions: 1) Retry the Expansion Bridge 1 read of main memory, OR
2) Retry the Processor 1 read of main memory. For reasons described
hereinafter, Solution 2 is preferred for ease of implementation.
This deadlock is therefore avoided by having the ARBus arbiter
prevent the Processor from reading main memory (via ARTRY*)
following the Processor's read of an expansion bridge.

Referring to FIG. 9, deadlock may occur when an ARBus master read of
an expansion bridge is followed by an ARBus master L2 hit or
allocate operation. A typical sequence of transactions is as
follows:

1. PCI Bus 1 Master initiates read of main memory, and stalls PCI
   Bus 1.
2. Master A reads target behind Expansion Bridge 1 (Expansion Bridge
   1 AAck*s without ARTRY*).
3. Master A issues memory read causing the L2 (second level cache)
   to allocate the cache line.
4. Expansion Bridge 1 must complete its read of main memory, but it
   cannot complete.

Because the TAG SRAMs utilize a latch to capture the address from
the main Address Bus during a TS_, no future TS_ can occur until the
completion of the TAG update. The system arbiter prevents future TS_
events by deasserting all Bus Grants to Masters until the completion
of the TAG update. Unfortunately, in the scenario described above,
the TAG update will not complete until the PCI Bus Master A read of
main memory has occurred. Master A has ordered itself: a) Read of
Expansion Bridge 1, b) Read of main memory. Expansion Bridge 1 has
ordered itself: a) Read of main memory, b) Read by Master A. This is
a Type-A LockUp. Since the TAG update must complete without future
occurrences of TS_, the deadlock fix is to have the ARBus arbiter
prevent an access by Master A that would cause a second level cache
hit or allocate (via ARTRY*) following Master A's read of an
expansion bridge.

Referring to FIG. 10, deadlock may occur when a processor read of an
expansion bridge is followed by a processor write to that expansion
bridge. A typical sequence of transactions is as follows:

1. Processor 1 reads target behind [...] bridge [...] completion in
   order to flush posted write data to target upstream of PCI2PCI
   bridge.
2. Processor 1 writes target behind Expansion Bridge 1.
3. Expansion Bridge 1 write attempt to main memory causes Processor
   1 to attempt Snoop Push.

The PCI2PCI bridge has been interlocked, and must flush a posted
write upstream of itself; in this case the write is headed toward
the ARBus, and Expansion Bridge 1's buffers are full and cannot
currently accept the write. The first two outstanding transactions
in this scenario are 1) Master Processor 1 has an outstanding read
of Expansion Bridge 1, followed by 2) Master Processor 1 has an
outstanding write to Expansion Bridge 1. The third attempted
transaction is a write cycle from Expansion Bridge 1 to main memory.
However, this write cycle is to copyback-cacheable space and causes
a snoop hit in Processor 1's cache. Processor 1 retries Expansion
Bridge 1's write cycle, but now needs to push the dirty cache line
to main memory. However, at this point it is unable to push the
dirty cache line due to its outstanding write to Expansion Bridge 1.
With the use of DBWO*, Processor 1 could have re-ordered the snoop
push write transaction around its outstanding read of Expansion
Bridge 1 (transaction number 1). However, the MPC60x microprocessor
is not capable of re-ordering the snoop push write transaction
around its own outstanding write. This is a Type-B LockUp, caused by
Processor 1's inability to complete its read due to the PCI2PCI
bridge's interlocking behavior. This deadlock is avoided by having
the ARBus arbiter prevent the Processor from writing to an expansion
bridge if it has an outstanding read of the expansion bridge. This
will allow the Processor to perform the Snoop Push write transaction
if required.

There is a set of deadlocks that only occur with more than one
expansion bridge in a system implemented with a split bus (ARBus),
ordered masters, ordered slaves, and utilizing a fixed priority
arbitration scheme for the masters on the ARBus. In one particular
system architecture, high to low priority is: 1) Video, 2) Expansion
Bridge 1, 3) Expansion Bridge 2, 4) Processor 1, 5) Processor 2.
Deadlock rules described previously also apply to a multiple
expansion bridge environment. The following new rules are in
addition to the previous rules.

Referring to FIG. 11, deadlock may occur in the case of concurrent
Bus Grant and Address Retry signals. A typical sequence of
transactions is as follows:

1. Expansion Bridge 1 attempts a write to Expansion Bridge 2 but
   Expansion Bridge 2 buffers are full.
2. Expansion Bridge 2 has a write to Expansion Bridge 1 and received
   Bus Grant during Expansion Bridge 1 cycle.
3. Expansion Bridge 2 ARTRY*s Expansion Bridge 1 due to full
   buffers. As per ARBus specification, Expansion Bridge 2 ignores
   its Bus Grant and does not take the ARBus.
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4. As per ARBus specification, following ARTRY* both accepts the read from the opposite expansion bridge. Neither 

Expansion Bridge 1 and Expansion Bridge 2 deassert of the accepted reads can complete because the buses they 

their Bus Requests for one clock. Both re-assert Bus are attempting to get onto are stalled. At least one of the 

Requests. Expansion Bridge 1 wins. The foregoing buses must free itself for this basic deadlock to be avoided; 

sequence of transactions is repeated indefinitely. 5 one expansion bridge must not accept the read, but must 

Following ARBus protocol after an ARtry*, a master who ARTRY* the read attempt to it. This is a Type-A LockUp, 

has a Bus Grant ignores it All masters must deassert their and fc avoided b havin an expansion bridge disallow a 

Bus Requests the dock following an ARtry* and then rcad of ^ sIave whi]e " has an outstandi masler read 

re-assert ihem. In a fixed priority arbitration scheme, the tenUfe (AAck , without ^^y.y Qnce tbe § ata bus t 

higher priority master will win everv time, and if it cannot , . > , « . • . , A 4 , 5 . 

complete its access, an ARBus livelock results. This is a 10 15 received corresponding to the address tenure then the 

Type-C LockUp, and is avoided by having the expansion transaction * guaranteed to complete and slave reads can be 

bridge disregard the ARBus protocol, and take the address aC ^f p j . 

tenure if a Bus Grant occurs during an ARtry* . An expansion Referring to FIG. 14, deadlock may occur in the case of 

bridge can do this without adverse side-effects because it is one m t aster a » em Pf^g *> *ad k>tb expansion bridges. A 

not a snooping bus master. 15 { W lcai sequ«K* 0 f transactions is as follows: 

Referring to FIG. 12, deadlock may occur in the case of 1- PCI Bus 1 Master initiates read of target behind 

concurrent Bus Request and Address Retry signals. A typical Expansion Bridge 2, and stalls PCI Bus 1. 

sequence of transactions is as follows: 2. Processor 1 reads target behind Expansion Bridge 1 

1. Video attempts a write to Expansion Bridge 1 but (Expansion Bridge 1 AAck*s without ARTRY*). 
Expansion Bridge 1 buffers are full; 20 3. Processor 1 reads target behind Expansion Bridge 2 

2. Expansion Bridge 1 has its Bus Request asserted (Expansion Bridge 2 AAck*s without ARTRY*). 
because it has a read of memory to perform, but Video, 4. PCI Bus 1 Master's read of target behind Expansion 
with multiple cycles to perform, keeps its Bus Request Bridge 2 occurs on ARBus (Expansion Bridge 2 
asserted. AAck*s without ARTRY*). 

3. Expansion Bridge 1 ARTRY*s Video due to full Master Processor 1 has ordered itself: a) Read Expansion 
buffers. As per ARBus specification, Expansion Bridge Bridge 1, b) Read Expansion Bridge 2. Slave Expansion 
1 and 'Video deassert their bus requests the clock Bridge 2 has ordered itself: a) Read by Processor 1, b) Read 
following ARTRY*. by Expansion Bridge 1. Expansion Bridge 1 has implied 

4. 'video and Expansion Bridge 1 reassert the bus requests. 30 ordering due to stalled PCI Bus of: a) Read of Expansion 
Since Video has a fixed higher priority than Expansion Bridge 2, b) Read by Processor 1. In this scenario, all three 
Bridge 1, it constantly gets Bus Grant. The foregoing devices involved have conflicting completion orders, 
sequence of transactions is repeated indefinitely. Although Processor l's read of the target behind Expansion 

Following ARBus protocol after an ARTRY*, all masters Bridge 2 can complete on PCI Bus 2, it cannot complete on 

on the bus deassert their Bus Requests to give the Processor 35 the ARBus until Processor l's read of Expansion Bridge 1 

a guaranteed window being the only bus requestor. This has completed. Expansion Bridge l's read of Expansion 

guarantees that the Processor, who normally has lowest Bridge 2 must complete before Processor l's read of Expan- 

ARBus priority, acquires the bus next in order to complete si <>n Bridge 1 can complete. Since Expansion Bridge 2 is 

a high priority transaction such as a Snoop Push. In this case, ordered to deliver the response to Processor l's read before 

the ARBus protocol causes the lower priority expansion 40 delivering the response to Expansion Bridge l's read, the 

bridge to never receive a Bus Grant due to the higher priority deadlock results. This is a Type-A LockUp, and is avoided 

Video requesting the ARBus to complete its access. Since Dv preventing one master from reading both an expansion 

the completion of the Video access is dependent on the bridges. This prevents the response ordering dependencies 

expansion bridge freeing up some buffer space, and since the f° r the master. 

expansion bridge must get the ARBus to complete its access 45 Referring to FIG. 15, deadlock may occur in the case of 

or receive an ARTRY* in order to free up PCI Bus 1 to free an ISA bus master reading a target behind an opposite 

up buffer space for the Video write to come in, the expansion expansion bridge. A typical sequence of transactions is as 

bridge effectively needs higher priority than Video this time. follows: 

This is a Type-C LockUp, and is avoided by having an 1. PCI Bus 2 Master reads ISA target behind Expansion 

expansion bridge keep its Bus Request asserted the clock 50 .Bridge 1, stalling PCI Bus 2 (Expansion Bridge 1 

following an ARTRY* if it is the source of the ARTRY*. AAck*s) 

This is precisely the protocol the MP60X processor per- 2. ISA Master on ISA initiates read of target behind 

forms to effectively achieve a higher priority when neces- Expansion Bridge 2. ISA Master cannot be backed off. 

sarv - 3. Expansion Bridge 1 forwards ISA Master's read to 

Referring to FIG. 13, deadlock may occur in the case of 55 Expansion Bridge 2. Expansion Bridge 2 retries Expan- 

expansion bridges reading each other concurrently. Atypical s i on Bridge 1 because PCI Bus 2 Master read is 

sequence of transactions is as follows: outstanding. This occurs indefinitely. 

1. A Master Behind Expansion Bridge 1 reads a target The fact that the master behind Expansion Bridge 2 got its 
behind Expansion Bridge 2 (Expansion Bridge 2 read AAck*ed by Expansion Bridge 1 on the ARBus prior to 
AAck*s) stalling PCI Bus 1. The read remains out- 60 the ISA bus master behind Expansion Bridge 1, implies that 
standing within Expansion Bridge 2. Expansion Bridge 2's completion order is: 1) Complete read 

2. A Master Behind Expansion Bridge 2 reads a target to ISA bus behind Expansion Bridge 1, 2) Accept incoming 
behind Expansion Bridge 1 (Expansion Bridge 1 read from Expansion Bridge 1 (or whomever). However, the 
AAck*s) stalling PCI Bus 2. The read remains out- ISA bus has initiated an access and will retry all accesses to 
standing within Expansion Bridge 1. 65 it until its read of the target behind Expansion Bridge 2 has 

This is the most basic deadlock case. Each expansion completed. The ISA bus completion order is: 1) Complete 

bridge has a stalled bus, and yet each expansion bridge read to target behind Expansion Bridge 2, 2) Accept incom- 



5,996,036 

17 18 

ing read from Expansion Bridge 2 (or whomever). These are lhai the PCI Bus Master on PCI Bus 2 has stalled its bus 

two masters have conflicting completion orders. Note that if with the read of the target behind Expansion Bridge 1 and 

PCI Bus 2 had not been stalled by its read and Expansion that the ISA bus master has stalled its ISA bus with the read 

Bridge 2 could have accepted the read from Expansion of main memory. If either bus were not stalled, then either 

Bridge 1, then all transactions would be able to complete. 5 the Processor 2 'read of the target behind Expansion Bridge 

This is a Type-A (PCI Bus 2 stall) and Type-B LockUp (ISA 2 would complete, or the Processor 1 read of the ISA target 

bus block). The fix is to allow ISA bus master cards to would complete. This is a Type-A (PG Bus 2 stall) and 

communicate only with main memory or targets behind the Type-B LockUp (ISA bus block): Since neither the PCI Bus 

same expansion bridge. For example, system software may 2 stall or the ISA bus block can be prevented, the deadlock 

remap accesses across the bridges to memory and complete is avoided by the ARBus arbiter to prevent Expansion 

transfers virtually. Bridge 2 from reading Expansion Bridge 1 if Expansion 

Referring to FIG. 16, deadlock may occur when a PCi bus Bridge 2 has an outstanding read. In general terms, if an 

master read gels stuck behmd a posted PCI bus master write, expansion bridge-A has an outstanding ARBus Master's 

A typical sequence of transactions is as follows: Slave Read ' lDeD tne ARBus arbiter should prevent 

1. Three transactions: a) Processor 1 Reads Expansion „ (ARTRY*) an expansion bridge-A from reading an expan- 
Bridge 1, b) Processor 1 Reads Expansion Bridge 1, c) 15 SK L n M * ™* lan * n f read has °™Pk«*- 

Processor 2 Reads Expansion Bridge 2. ReferT1D S <° ... ' dc f*°* m f D " ™ hcn °° e 

„■ w . . ^ . ^ . , f, expansion bridge, with an outstanding ARBus read, accepts 

2. Meanwhile: a) Expansion Bridge 1 has a write trans- a rcad from another expansioi) bridge . A typical sequence of 

action destined for Expansion Bridge 2, and a PCI Bus transactions is as follows* 

Master on PCI Bus 1 issues a read of memory, stalling 20 h Expansion Bridge 1 accepts two ARBus to PCI Bus 1 

PCI Bus 1, b) Expanswn Bridge 2 has a write trans- ^les. Meanwhile, a PCI Bus Master on PCI Bus 1 has 

action destined for Expansion Bridge 1, and a PCI Bus initiated a read access from a target behind Expansion 

Master on PCI Bus 2 issues a read of memory, stalling Bridge 2. 

PCI Bus 2. 2. Expansion Bridge 2 accepts a read from Processor 2 to 

The normal means to get a PCI Bus Master read to free up 25 the PCI2PCI bridge, followed by a read from Expan- 

tbe PCI Bus is to retry a transaction from the PCI bus when sion Bridge 1. Meanwhile, Expansion Bridge 2 also 

it cannot be serviced. Normally, the PCI Bus Master read accepts two PCI Bus to Expansion Bridge 1 write 

would propagate to the ARBus, attempt its cycle on the cycles. 

ARBus, and either complete or get an ARTRY*. In either 3. The Processor 2 read of the PCI2PCI bridge causes the 

event, it frees up the bus. For a high-performance 30 bridge to attempt to flush posted write data to main 

architecture, concurrent reads are desired at all times. The memory. Since all buffers are filled in the direction of 

scenario on both PCI buses is that they are stalled with reads PCI Bus 2 t0 PCI Bus 1, and Pa Bus 1 is stalled, the 

heading to memory, but there are write transactions to the PCI2PCI bridge cannot flush its data, 

opposite expansion bridge in each expansion bridge which 7^ pro blem with this scenario is that the two PCI buses 

cannot complete (because the transaction limit has been 35 have conflicting completion orders. Since Expansion Bridge 

reached). Since neither expansion bridge's ARBus master 2 AAck*ed Expansion Bridge l's read, PCI Bus 1 has 

write transactions can complete their address tenure, their committed to completing the read before allowing any other 

respective PCI Bus Master read tenures cannot gain access accesses to occur, thereby stalling the PCI Bus. The Pro- 

to the ARBus to complete or receive an ARTRY*. In this cessor 2 read of the PCI2PCI bridge has kicked off the 

mstance the PCI buses will remain stalled indefinitely. This 40 interlocking behavior of the bridge. The PC12PQ bridge 

is a Type-A LockUp, and is avoided by having an expansion wi ]j Dot service tbe read until it has completed its writes, 

bridge immediately retry PCI Bus master reads if it has a Unfortunately, to complete its write, an access must occur on 

PCI Bus master write transaction queued up in front of it that pq Bus 1 to free up some buffer space. PCI Bus 2 won't 

has not completed. This will ensure that the PQ Bus master service the rea d until it executes the write, and PCI Bus 1 

read has access to tbe ARBus to complete the access or 45 won > t ^ice the write until it completes the read. This is a 

receive an ARTRY*. Type-A (PCI Bus 1 stall) and Type-B LockUp (PCI2PCI 

Refemng to FIG. 17, deadlock may occur when the bridge block). Since neither the PCI Bus 1 stall or the 

ARBus transaction limit is hit, and accesses cannot complete. A typical sequence of transactions is as follows:

1. Three transactions: a) Processor 1 Reads ISA target behind Expansion Bridge 1, b) Processor 2 Reads target behind Expansion Bridge 2, c) Expansion Bridge 2 Reads target behind Expansion Bridge 1, stalling PCI Bus 2.

2. Meanwhile: a) Expansion Bridge 1 has a write transaction destined for Expansion Bridge 2, and b) an ISA Bus Master has initiated a read access of main memory on the ISA Bus. The ISA bus master cannot be backed off. This ISA bus master access blocks the Processor 1 Read from completing.

The fundamental problem with this scenario is that the transaction queue depths are limited to three transactions. If the depth were four, then the Expansion Bridge 1 write transaction destined for Expansion Bridge 2 could complete, allowing the ISA bus master read of main memory to complete, etc. Given that the transaction queue depths are limited to three transactions, the other two problems to note

PCI2PCI bridge block can be prevented, the fix is for the ARBus arbiter to prevent Expansion Bridge 1 from reading Expansion Bridge 2 if Expansion Bridge 2 has an outstanding read. In general terms, if an expansion bridge has an outstanding ARBus master's Slave Read, then the ARBus arbiter should prevent (ARTRY*) another expansion bridge from reading that expansion bridge until the outstanding read has completed.

The following summary is a compilation of the foregoing rules. Items below in italic text are deadlock avoidance rules for which an expansion bridge is responsible, and items below in plain text are deadlock avoidance rules for which the ARBus arbiter or processor bus arbiter is responsible.

A1. The ARBus arbiter must prevent an ARBus master from reading main memory (via ARTRY*) if that master has an outstanding read of an expansion bridge.

A2. The ARBus arbiter must prevent an access by an ARBus master that would cause a second level cache hit or allocate (via ARTRY*) if that master has an outstanding read of an expansion bridge.
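Rules A1 and A2 stated above can be read as a simple predicate evaluated by the arbiter for each address tenure. The following is a minimal sketch only, assuming a hypothetical `Access` record and a hypothetical set of masters that currently have a read of an expansion bridge outstanding; none of these names appear in the original description.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class Access:
    """Hypothetical description of one address tenure seen by the arbiter."""
    master: str                      # name of the ARBus master making the access
    target: str                      # e.g. "main_memory" or "expansion_bridge"
    is_read: bool                    # True for a read, False for a write
    l2_hit_or_allocate: bool = False  # access would hit or allocate in the L2 cache


def arbiter_asserts_artry(access: Access, masters_with_bridge_read: Set[str]) -> bool:
    """Return True if the arbiter should retry (ARTRY*) the access under A1/A2."""
    # Both rules apply only to a master that has an outstanding read
    # of an expansion bridge.
    if access.master not in masters_with_bridge_read:
        return False
    # A1: such a master may not read main memory.
    if access.is_read and access.target == "main_memory":
        return True
    # A2: such a master may not cause a second level cache hit or allocate.
    if access.l2_hit_or_allocate:
        return True
    return False
```

In this sketch the caller is assumed to add a master to `masters_with_bridge_read` when its read of an expansion bridge is accepted and to remove it when that read completes.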
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. ART^nc An Address Slave state machine and a PCI Master state 

ma £^ J2^£^ * funher deadly avoidance rule 

T^^T^oT^^ bridge to allow for in similar manner as described prev.ously in relation to the 

Snoop Push write transactions.

Deadlock Avoidance Rules for Multiple Expansion Bridges, Split Bus, Fixed Priority Ordered Masters and Slaves:

B1. An expansion bridge must disregard ARBus protocol and take the address tenure if a Bus Grant occurs concurrent with an ARTRY*.

B2. An expansion bridge must disregard ARBus protocol and keep its Bus Request asserted the clock following an ARTRY* if it is the source of the ARTRY*.

B3. An expansion bridge must disallow a read of its slave while it has an outstanding master read transaction and its corresponding data tenure has not begun.

B4. The ARBus arbiter must prevent one master from reading both expansion bridges

system arbiter. That is, a deadlock hazard is detected during which if a deadlocking transaction is detected, that transaction is refused. In particular, the Address Slave state machine disallows a read of its slave while it has an outstanding master read transaction and its corresponding data tenure has not begun. The PCI Master state machine retries PCI Bus master reads if it has a PCI Bus master write transaction queued up in front of it that has not completed.

Use of the described deadlock avoidance techniques enables a high-performance split-transaction system bus to be interfaced to a single-envelope expansion bus without compromising system reliability. Rather than the characteristics of the expansion bus limiting the performance of the system bus, the performance of the system bus may be sepa-

^i^S^T^^^^ * -* — ^ - * 

if it has a PCI Bus master ^transaction queued up m ^ far has assumed a system in which 

Kan tSSwSft an outstanding ARBus boJmasters'and slaves are ordered, .n particular the 60X 

meter's sUe Re ad the ARBus arbiter must prevent microprocessor assumes that its transacts are ordered. As 

TartoyM that Mansion bridT from reading another 25 a consequence, master ordering is to some extent ingrained 
(ARTRY ) that expansion onoge > iron, s uoderlying system architecture. Slave ordenng, 

■^^SSLtEf nit Standing ARBus on the other hand! aftholgh i, may be convenient from an 
. . S !?TST tta ARBus arbiter must prevent implementation perspective, is not required. Increased effi- 

™™ that cienc y ma y ta achieved by re,axing tbe constr ! int of t ve 

£r£ta Ttos. siai.b »e bussed to tbe block 613 iusle.d reoeive. to inputs dl of th. queu ; .nines of .11 of Ito d.vs 

pending. J nese signals are ou^cu^ (instead of just all of the front entries as in FIG. 6). 

of «aosl queue eou.es. _ sudeiecudesd- TTjerefore, if tbe masters aie numbered 0 tbrougb M, the 

F^ttofr^o^s.^foobltrf613d.l«^<M i^^„ -bercd oitoougbS M dlboq»«»loc.lta» 

the bLk ^letec^adlockmg transactions and in 55 

TSMrSi^K"--^-*-*- imp,ementati on inste . d of r^rirtu^ 

hridfe TOO deadlock avoidance is implemented in an memory. In an exemplary embodiment with S-5, M-4, and 

ARBus rn'trol Wocf 7^and in a PCIBus control block 0-2, the number of bits received from me slave queues is 

Stln^SSoSTan Address Master state machine 60 6x6x3-108 bits. Since masters remained ordered, the Aib- 

wJUSSA^PZiA 700 to disregard the Mux 603' continues to receive only the 

Us protocol t ^«™- « ££Z Z£^J&L m Ji 

Sress~ir*^^^ r-^ ltl+1> t^ttfS^53: 

sion bridee 700 to disregard the ARBus protocol and keep its 65 slave queue entries, one of the bits in the expression 

Bus t5£££i clock following an ARTRY if it is HI) isa valid ^^£ m ^ST (S+1+1+1) 

the source of the ARTRY. is a read/write bit as described previously.

Furthermore, the ArbMux 603', instead of receiving only a single RDDA signal from each slave, now receives an RDDA signal for each slave queue entry. In the illustrated embodiment, the ArbMux 603' therefore receives (S+1)(Q+1)=6x3=18 bits.

In the arbiter of FIG. 6, the ArbMux 603 outputs two bits (SlvMatch and SlvRdReady) for each master in the system. The ArbMux 603' of FIG. 19, on the other hand, outputs two bits for each master for each queue location. Hence, the ArbMux 603' outputs 2(M+1)(Q+1) bits which are input to the ArbDatSM 604'. In the illustrated embodiment, the ArbMux 603' outputs 2x5x3=30 bits for input to the ArbDatSM 604'. The front queue entries from each of the master queues are input to the ArbDatSM 604' as before.

The ArbDatSM 604 of FIG. 6 produces two sets of output signals, DBG and SSD. The DBG output signals remain unchanged in the case of the ArbDatSM 604'. One DBG signal is output for each master for a total of M+1 DBG signals. Instead of outputting a single SSD signal for each slave device, however, the ArbDatSM 604' outputs an SSD signal for each queue location within each slave device, for a total of (S+1)(Q+1) bits (6x3=18 bits in the illustrated embodiment).

The ArbDatSM 604' receives multiple address coincidence (AC) signals from each of the slave devices. In the illustrated embodiment the ArbDatSM 604' receives from each slave device a separate signal for every possible pair of queue entries within the slave device, indicating whether the same cache line is the target of both transactions queued within the pair of queue entries. In general there are Q(Q+1)/2 possible pairs of queue entries within a slave device. The ArbDatSM 604' therefore receives (S+1)[Q(Q+1)/2] total address coincidence bits or, in the illustrated embodiment, 6x2x3/2=18 bits. The ARtryGen block 613', in addition to the BG and SACK vector inputs previously described in relation to the ARtryGen block 613 of FIG. 6, also receives the same address coincidence signals.

In the case of some slave devices, the average latency of the slave device may be reduced by reordering transactions involving the slave device. In the case of DRAM, for example, page mode [...] reads. Hence, in the embodiment of [...] one slave [...] a separate signal for every possible pair of queue entries within the slave device, indicating whether the targets of both transactions queued within the pair of queue entries are within the same page. The ArbDatSM block 604' therefore receives Q(Q+1)/2 total page coincidence bits or, in the illustrated embodiment, 2x3/2=3 bits.

Referring now to FIG. 20, the inputs and outputs of the ArbMux block 603' are illustrated in greater detail. For each master M0 through M(M+1), the ArbMux 603' receives [...] frontmost queue [...] the master queues to the ArbMux 603' are therefore represented as M0Q0, M1Q0, . . . , M(M+1)Q0.

In the case of the slave queues, every slave queue entry is input into the ArbMux 603'. Hence, for the slave queue S0, inputs to the ArbMux 603' include S0Q0, S0Q1, . . . , S0Q(Q+1), likewise for each slave queue in sequence up to and including the last slave queue S(S+1), whose inputs include S(S+1)Q0, S(S+1)Q1, . . . , S(S+1)Q(Q+1). The ArbMux 603' receives from the slave devices themselves individual Read Ready signals for each queue location. From Slave 0, therefore, the ArbMux 603' receives RDDA00, RDDA01, . . . , RDDA0(Q+1), and likewise for each slave up to and including the last slave device, Slave S+1, whose inputs include RDDA(S+1)0, RDDA(S+1)1, . . . , RDDA(S+1)(Q+1).

In FIG. 6, a transaction is allowed to proceed only if it is the frontmost transaction of both the master and the slave. [...] matching queue location within the slave is by definition [...] slave. In the case of the ArbMux 603 of FIG. 6, [...] function is to identify masters whose next [...] order is also the next [...] device. In the case of the ArbMux 603' of FIG. 19, slave ordering is no longer required. Hence, the function of the ArbMux 603' is to identify for each master the queue location within the target slave that matches the frontmost transaction of the master. The ArbMux 603' also indicates whether transaction data for that queue location is ready. Hence, for each master, two bits, a SlvMatch bit and a SlvRdReady bit, are output for each queue location. In the case of master M0, the bit pairs output by the ArbMux 603' are designated M0Q0, M0Q1, . . . , M0Q(Q+1), and likewise for each succeeding master up to and including the last master M(M+1), the outputs for which are M(M+1)Q0, M(M+1)Q1, . . . , M(M+1)Q(Q+1). If a master has a valid [...] for the frontmost valid transaction, the SlvMatch signal for that master that corresponds to the matching target slave queue location will be asserted. If the master [...] transaction in its queue, then no signal is asserted for that master.

The inputs and outputs of ArbMux 603' are illustrated in greater detail in FIG. 21 for the case M=4, S=5 and Q=2.

Referring to FIG. 22, the inputs and outputs of the ArbDatSM 604' are illustrated in greater detail. The outputs of the ArbMux 603' described previously are shown as being input to the ArbDatSM 604' at a top edge thereof. These [...] used by the ArbDatSM 604' to determine which [...] bus by asserting one of the Data Bus Grant signals DBG0 through DBG(M+1) output by the [...] are also used by the [...] 604' to determine which SSD signal of the target slave is to be asserted according to the queue location that the transaction occupies within the slave queue. Which slave is the target slave is identified [...] queue entries, shown as being input to the ArbDatSM 604' at a left edge thereof in like manner as in FIG. 6.

The ArbDatSM 604' outputs an SSD signal corresponding to each slave queue location. Hence, for Slave 0, the outputs of the ArbDatSM 604' include SSD00, SSD01, . . . , SSD0(Q+1), and so forth for each slave up to and including Slave S+1, the outputs for which include SSD(S+1)0, [...] used by the ArbDatSM 604' to ensure that data dependencies [...] optimization [...] described more fully hereinafter.

In its basic operation, the ArbDatSM 604' performs the following functions:

1. Determines the highest priority master having a transaction [...] based on:

a) the SlvMatch bits for all of the masters;

b) the SACK vectors in the frontmost queue locations of all of the master queues; and

c) the SlvRdReady bits for all of the masters.

2. Asserts the corresponding DBG signal for the winning master; and

3. Asserts the correct SSD signal for the target slave based on:

a) the SlvMatch bits for the winning master; and

b) the SACK vector in the frontmost queue location of the winning master.
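The three functions listed above can be illustrated with a small model. This is a sketch under stated assumptions rather than the circuit of the ArbDatSM 604': the function names `pick_winner` and `drive_outputs` are hypothetical, priority is assumed to be fixed with the lowest-numbered master highest, the SACK vector is assumed to be already decoded into a slave index, and a transaction is assumed eligible only when both its SlvMatch and SlvRdReady bits are set.

```python
from typing import List, Optional, Tuple

Winner = Tuple[int, int, int]  # (master, target slave, slave queue location)


def pick_winner(slv_match: List[List[int]],
                slv_rd_ready: List[List[int]],
                target_slave: List[Optional[int]]) -> Optional[Winner]:
    """Step 1: find the highest priority master with an eligible transaction.

    slv_match[m][q]    -- 1 if location q of the target slave's queue holds
                          the frontmost transaction of master m
    slv_rd_ready[m][q] -- 1 if the data for that location is ready
    target_slave[m]    -- slave index decoded from the SACK vector at the
                          front of master m's queue, or None if the queue is empty
    """
    for m, slave in enumerate(target_slave):  # assumed: lower index wins
        if slave is None:
            continue
        for q, (match, ready) in enumerate(zip(slv_match[m], slv_rd_ready[m])):
            if match and ready:
                return m, slave, q
    return None


def drive_outputs(winner: Optional[Winner], n_masters: int,
                  n_slaves: int, depth: int) -> Tuple[List[int], List[List[int]]]:
    """Steps 2 and 3: assert one DBG bit and one SSD bit for the winner."""
    dbg = [0] * n_masters
    ssd = [[0] * depth for _ in range(n_slaves)]
    if winner is not None:
        m, s, q = winner
        dbg[m] = 1      # step 2: Data Bus Grant for the winning master
        ssd[s][q] = 1   # step 3: SSD for the matching queue location of the target slave
    return dbg, ssd
```

With five masters, six slaves and three queue locations per slave (the sizes of the illustrated embodiment), `drive_outputs` produces 5 DBG bits and 6x3=18 SSD bits, which agrees with the signal counts given in the description.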

As may be appreciated from the foregoing description, the 
system of FIG. 19 is much more loosely coupled than the 
system of FIG. 6. The loosely-coupled nature of the system 
of FIG. 19 may be taken advantage of to improve the way 
in which deadlocks are avoided. 

As previously described in relation to FIG. 6, slave ordering is a major cause of deadlock. When what would otherwise be a deadlocking transaction is detected, it is "killed" by issuing an ARtry signal. Without slave ordering, a large proportion of what would otherwise be deadlocking transactions, instead of being killed, can now be accepted and reordered in relation to other transactions so as to avoid deadlock. Such reordering is not possible, however, when a data dependency exists. For example, a read of one data location by one device followed by a write of the same data location by another device does not yield the same result as if the execution order is reversed. If a deadlock situation cannot be avoided by transaction reordering because of a data dependency, the need remains to kill the deadlocking transaction.

Of course, data dependencies may also exist absent any potential deadlock situation. Observing such data dependencies will not cause any transaction to be killed as in a deadlock situation, although it may reduce somewhat the utilization of the bus.

Information regarding data dependencies is input to the ArbDatSM 604' in the form of address coincidence (AC) signals from each of the slaves. Using this information, the ArbDatSM 604' schedules transactions so as to observe all data dependencies. For each of slave devices 0 to S+1, the ArbDatSM 604' receives Q(Q+1)/2 address coincidence bits. In the case of Q=2, for example, the ArbDatSM 604' receives three address coincidence bits from each slave: AC01, AC02, and AC12, each indicating that the two subscripted queue locations have target addresses within the same cache line.

In operation, the ArbDatSM 604' uses the address coincidence signals as follows:

1. The ArbDatSM selects for each master a set of address coincidence bits from a particular slave in accordance with the SACK vectors at the head of the respective master queues.

2. Each selected set of address coincidence bits is used to determine for that particular slave device which queue location or locations cannot have the transaction queued therein go next without violating a data dependency.

3. For each master, the SlvMatch bits input to the ArbDatSM are modified in accordance with the results of Step 2 to turn off selected SlvMatch bits, if necessary, in order to ensure that data dependencies are observed.

To take a concrete example, assume that the frontmost queue entry for Master 0 designates Slave 0. Assume further that the SlvMatch bits for Master 0 are 010, indicating that the match is for queue entry 1 of Slave 0. Without taking into account the address coincidence bits of Slave 0, the transaction in queue entry 1 will be executed if Master 0 is the winning master. Now assume that the address coincidence bits of Slave 0 are 100, indicating that the transactions within queue locations 0 and 1 are directed to the same cache line. A data dependency therefore exists between the transactions such that they must be executed in order. To prevent the transaction in queue entry 1 from being executed before the transaction in queue entry 0, the SlvMatch bits of Master 0 are modified, e.g., changed from 010 to 000. The same modification is performed for each arbitration cycle until the transaction in queue entry 0 has executed. The address coincidence bits for Slave 0 will then be 000. The SlvMatch bits of Master 0 then, instead of being modified, remain 010 such that the transaction in queue entry 1 may be executed next if Master 0 is the winning master.

The ArbDatSM 604' uses the page coincidence (PC) bits in a similar manner, not to enforce data dependencies but to reduce slave latency and boost system performance. In the illustrated embodiment, PC bits are received from DRAM only. In other embodiments, PC bits may be received from other or additional slave devices. The slave device is responsible, once a PC bit has been asserted, to keep that PC bit asserted until both of the page-coincident transactions have been executed (or, more precisely, scheduled for execution).

In operation, the ArbDatSM 604' determines to which masters the PC bits will be applied, e.g., which masters have a DRAM transaction at the front of their queues, in accordance with the SACK vectors at the head of the master queues. The PC bits are then used to determine which queue locations cannot have the transactions queued therein go next without forfeiting the speed advantage to be gained from paged access. In practice, if a PC bit is asserted, the transactions to which the PC bit relates will be scheduled for execution prior to any other transactions involving the DRAM. In other words, if the DRAM has three transactions queued, two of which are to the same page, the execution order will be COINCIDENT, COINCIDENT, NON-COINCIDENT, instead of NON-COINCIDENT, COINCIDENT, COINCIDENT, although both sequences yield the same speed advantage. In other embodiments, any execution order that results in the page-coincident transactions being executed one after another without any intervening transaction may be acceptable for purposes of the PC bits.

The AC and PC bits may be regarded as control inputs to a bit filter that operates upon the SlvMatch bits, as shown in FIG. 23.

The inputs and outputs of ArbDatSM 604' are illustrated in greater detail in FIG. 24 for the case M=4, S=5 and Q=2.

Referring to FIG. 25, the inputs and outputs of the ARtryGen block 613' are illustrated in greater detail. The inputs along the top and left edges of the ARtryGen block 613' remain unchanged compared to the ARtryGen block 613 of FIG. 6. Unlike the ARtryGen block 613 of FIG. 6, however, the ARtryGen block 613', instead of generating ARtry based on the assumption of ordered slaves, uses certain deadlock address-coincidence (DLAC) inputs received at the bottom edge of the block to generate a "qualified" ARtry signal only when a data dependency prevents transactions from being reordered so as to avoid the deadlock. The slave devices each monitor each system bus address tenure and compare the address placed on the bus to addresses queued within the respective slave devices. If the address on the bus is the same as an address already queued within the slave device, the slave device raises its DLAC signal to the ARtryGen block 613'. All slave devices or only selected slave devices (most importantly DRAM) may monitor the bus and signal the ARtryGen block 613' in this manner. In the illustrated embodiment, all slave devices are assumed to provide a DLAC signal. The ARtryGen block 613' therefore receives signals DLAC0 through DLAC(S+1).
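The two address-comparison mechanisms described in this part of the text, the filtering of SlvMatch bits by the address coincidence (AC) bits and the DLAC signal raised by a slave, can be illustrated with a short model. This is a sketch only. The function names are hypothetical, bit strings are read left to right as AC01, AC02, AC12 and as queue entries 0, 1, 2 (matching the 100 and 010 values of the worked example), and the lower-numbered queue location of a coincident pair is assumed to hold the older transaction.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def ac_bits_from_string(bits: str, depth: int) -> Dict[Pair, int]:
    """Map a bit string such as "100" onto queue-location pairs (0,1), (0,2), (1,2)."""
    pairs = list(combinations(range(depth), 2))
    if len(bits) != len(pairs):
        raise ValueError("expected one AC bit per pair of queue locations")
    return {pair: int(b) for pair, b in zip(pairs, bits)}


def filter_slv_match(slv_match: List[int], ac: Dict[Pair, int]) -> List[int]:
    """Turn off SlvMatch bits whose transaction must wait for an older,
    address-coincident transaction in the same slave queue."""
    filtered = list(slv_match)
    for (older, newer), coincident in ac.items():
        if coincident:
            # Assumption: the lower-numbered location must execute first.
            filtered[newer] = 0
    return filtered


def dlac(bus_address: int, queued_addresses: List[int]) -> int:
    """Return 1 if the address on the bus equals an address already queued
    in the slave (exact match, as in the description), otherwise 0."""
    return int(bus_address in queued_addresses)
```

Applied to the worked example, `filter_slv_match([0, 1, 0], ac_bits_from_string("100", 3))` turns the SlvMatch bits 010 into 000, and once the AC bits return to 000 the same call leaves 010 unchanged.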



